1.
medRxiv ; 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38559045

ABSTRACT

Importance: Diagnostic errors are common and cause significant morbidity. Large language models (LLMs) have shown promise in their performance on both multiple-choice and open-ended medical reasoning examinations, but it remains unknown whether the use of such tools improves diagnostic reasoning. Objective: To assess the impact of the GPT-4 LLM on physicians' diagnostic reasoning compared to conventional resources. Design: Multi-center, randomized clinical vignette study. Setting: The study was conducted using remote video conferencing with physicians across the country and in-person participation across multiple academic medical institutions. Participants: Resident and attending physicians with training in family medicine, internal medicine, or emergency medicine. Interventions: Participants were randomized to access GPT-4 in addition to conventional diagnostic resources or to just conventional resources. They were allocated 60 minutes to review up to six clinical vignettes adapted from established diagnostic reasoning exams. Main Outcomes and Measures: The primary outcome was diagnostic performance based on differential diagnosis accuracy, appropriateness of supporting and opposing factors, and next diagnostic evaluation steps. Secondary outcomes included time spent per case and final diagnosis. Results: 50 physicians (26 attendings, 24 residents) participated, with an average of 5.2 cases completed per participant. The median diagnostic reasoning score per case was 76.3 percent (IQR 65.8 to 86.8) for the GPT-4 group and 73.7 percent (IQR 63.2 to 84.2) for the conventional resources group, with an adjusted difference of 1.6 percentage points (95% CI -4.4 to 7.6; p=0.60). The median time spent on cases for the GPT-4 group was 519 seconds (IQR 371 to 668 seconds), compared to 565 seconds (IQR 456 to 788 seconds) for the conventional resources group, with a time difference of -82 seconds (95% CI -195 to 31; p=0.20). 
GPT-4 alone scored 15.5 percentage points (95% CI 1.5 to 29, p=0.03) higher than the conventional resources group. Conclusions and Relevance: In a clinical vignette-based study, the availability of GPT-4 to physicians as a diagnostic aid did not significantly improve clinical reasoning compared to conventional resources, although it may improve components of clinical reasoning such as efficiency. GPT-4 alone demonstrated higher performance than both physician groups, suggesting opportunities for further improvement in physician-AI collaboration in clinical practice.

3.
Nat Immunol ; 25(4): 644-658, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38503922

ABSTRACT

The organization of immune cells in human tumors is not well understood. Immunogenic tumors harbor spatially localized multicellular 'immunity hubs' defined by expression of the T cell-attracting chemokines CXCL10/CXCL11 and abundant T cells. Here, we examined immunity hubs in human pre-immunotherapy lung cancer specimens and found an association with beneficial response to PD-1 blockade. Critically, we discovered the stem-immunity hub, a subtype of immunity hub strongly associated with favorable PD-1-blockade outcome. This hub is distinct from mature tertiary lymphoid structures and is enriched for stem-like TCF7+PD-1+CD8+ T cells, activated CCR7+LAMP3+ dendritic cells and CCL19+ fibroblasts as well as chemokines that organize these cells. Within the stem-immunity hub, we find preferential interactions between CXCL10+ macrophages and TCF7-CD8+ T cells as well as between mature regulatory dendritic cells and TCF7+CD4+ and regulatory T cells. These results provide a picture of the spatial organization of the human intratumoral immune response and its relevance to patient immunotherapy outcomes.


Subjects
Lung Neoplasms , Humans , CD8-Positive T-Lymphocytes , Programmed Cell Death 1 Receptor , Chemokines/metabolism , Immunotherapy/methods , Tumor Microenvironment
4.
NPJ Digit Med ; 7(1): 20, 2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38267608

ABSTRACT

One of the major barriers to using large language models (LLMs) in medicine is the perception that they use uninterpretable methods to make clinical decisions that are inherently different from the cognitive processes of clinicians. In this manuscript we develop diagnostic reasoning prompts to study whether LLMs can imitate clinical reasoning while accurately forming a diagnosis. We find that GPT-4 can be prompted to mimic the common clinical reasoning processes of clinicians without sacrificing diagnostic accuracy. This is significant because an LLM that can imitate clinical reasoning to provide an interpretable rationale offers physicians a means to evaluate whether an LLM's response is likely correct and can be trusted for patient care. Prompting methods that use diagnostic reasoning have the potential to mitigate the "black box" limitations of LLMs, bringing them one step closer to safe and effective use in medicine.

6.
J Hepatol ; 80(2): 251-267, 2024 Feb.
Article in English | MEDLINE | ID: mdl-36972796

ABSTRACT

BACKGROUND & AIMS: Chronic viral infections present serious public health challenges; however, direct-acting antivirals (DAAs) are now able to cure nearly all patients infected with hepatitis C virus (HCV), representing the only cure of a human chronic viral infection to date. DAAs provide a valuable opportunity to study immune pathways in the reversal of chronic immune failures in an in vivo human system. METHODS: To leverage this opportunity, we used plate-based single-cell RNA-seq to deeply profile myeloid cells from liver fine needle aspirates in patients with HCV before and after DAA treatment. We comprehensively characterised liver neutrophils, eosinophils, mast cells, conventional dendritic cells, plasmacytoid dendritic cells, classical monocytes, non-classical monocytes, and macrophages, and defined fine-grained subpopulations of several cell types. RESULTS: We discovered cell type-specific changes post-cure, including an increase in MCM7+STMN1+ proliferating CD1C+ conventional dendritic cells, which may support restoration from chronic exhaustion. We observed an expected downregulation of interferon-stimulated genes (ISGs) post-cure as well as an unexpected inverse relationship between pre-treatment viral load and post-cure ISG expression in each cell type, revealing a link between viral loads and sustained modifications of the host's immune system. We found an upregulation of PD-L1/L2 gene expression in ISG-high neutrophils and IDO1 expression in eosinophils, pinpointing cell subpopulations crucial for immune regulation. We identified three recurring gene programmes shared by multiple cell types, distilling core functions of the myeloid compartment. CONCLUSIONS: This comprehensive single-cell RNA-seq atlas of human liver myeloid cells in response to cure of chronic viral infections reveals principles of liver immunity and provides immunotherapeutic insights. CLINICAL TRIAL REGISTRATION: This study is registered at ClinicalTrials.gov (NCT02476617). 
IMPACT AND IMPLICATIONS: Chronic viral liver infections continue to be a major public health problem. Single-cell characterisation of liver immune cells during hepatitis C and post-cure provides unique insights into the architecture of liver immunity contributing to the resolution of the first curable chronic viral infection of humans. Multiple layers of innate immune regulation during chronic infections and persistent immune modifications after cure are revealed. Researchers and clinicians may leverage these findings to develop methods to optimise the post-cure environment for HCV and develop novel therapeutic approaches for other chronic viral infections.


Subjects
Chronic Hepatitis C , Hepatitis C , Humans , Antiviral Agents/therapeutic use , Persistent Infection , Hepatitis C/drug therapy , Hepacivirus/genetics
7.
Lancet Digit Health ; 6(1): e70-e78, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38065778

ABSTRACT

BACKGROUND: Preoperative risk assessments used in clinical practice are insufficient in their ability to identify risk for postoperative mortality. Deep-learning analysis of electrocardiography can identify hidden risk markers that can help to prognosticate postoperative mortality. We aimed to develop a prognostic model that accurately predicts postoperative mortality in patients undergoing medical procedures and who had received preoperative electrocardiographic diagnostic testing. METHODS: In a derivation cohort of preoperative patients with available electrocardiograms (ECGs) from Cedars-Sinai Medical Center (Los Angeles, CA, USA) between Jan 1, 2015 and Dec 31, 2019, a deep-learning algorithm was developed to leverage waveform signals to discriminate postoperative mortality. We randomly split patients (8:1:1) into subsets for training, internal validation, and final algorithm test analyses. Model performance was assessed using area under the receiver operating characteristic curve (AUC) values in the hold-out test dataset and in two external hospital cohorts and compared with the established Revised Cardiac Risk Index (RCRI) score. The primary outcome was post-procedural mortality across three health-care systems. FINDINGS: 45 969 patients had a complete ECG waveform image available for at least one 12-lead ECG performed within the 30 days before the procedure date (59 975 inpatient procedures and 112 794 ECGs): 36 839 patients in the training dataset, 4549 in the internal validation dataset, and 4581 in the internal test dataset. In the held-out internal test cohort, the algorithm discriminates mortality with an AUC value of 0·83 (95% CI 0·79-0·87), surpassing the discrimination of the RCRI score with an AUC of 0·67 (0·61-0·72). The algorithm similarly discriminated risk for mortality in two independent US health-care systems, with AUCs of 0·79 (0·75-0·83) and 0·75 (0·74-0·76), respectively. 
Patients determined to be high risk by the deep-learning model had an unadjusted odds ratio (OR) of 8·83 (5·57-13·20) for postoperative mortality compared with an unadjusted OR of 2·08 (0·77-3·50) for postoperative mortality for RCRI scores of more than 2. The deep-learning algorithm performed similarly for patients undergoing cardiac surgery (AUC 0·85 [0·77-0·92]), non-cardiac surgery (AUC 0·83 [0·79-0·88]), and catheterisation or endoscopy suite procedures (AUC 0·76 [0·72-0·81]). INTERPRETATION: A deep-learning algorithm interpreting preoperative ECGs can improve discrimination of postoperative mortality. The deep-learning algorithm worked equally well for risk stratification of cardiac surgeries, non-cardiac surgeries, and catheterisation laboratory procedures, and was validated in three independent health-care systems. This algorithm can provide additional information to clinicians making the decision to perform medical procedures and stratify the risk of future complications. FUNDING: National Heart, Lung, and Blood Institute.
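The 8:1:1 patient-level split described in the methods can be illustrated with a minimal sketch. It assumes the split is made on unique patient identifiers, so that all ECGs from one patient fall into a single subset; the function name `split_patients`, the fixed seed, and the integer IDs are hypothetical and not taken from the study.

```python
import random

def split_patients(patient_ids, ratios=(8, 1, 1), seed=0):
    """Shuffle unique patient IDs and cut them into train/validation/test.

    Splitting by patient (not by ECG) keeps one patient's recordings
    from appearing in more than one subset.
    """
    ids = sorted(set(patient_ids))          # deduplicate, fix a stable order
    random.Random(seed).shuffle(ids)        # reproducible shuffle (seed is illustrative)
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]            # remainder goes to the test subset
    return train, val, test

# Toy example with 1000 made-up patient IDs.
train_ids, val_ids, test_ids = split_patients(range(1000))
```

Because of integer rounding, the subset sizes only approximate 8:1:1 when the cohort size is not a multiple of ten, which is consistent with the slightly uneven counts reported in the findings.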


Subjects
Deep Learning , Humans , Risk Assessment/methods , Algorithms , Prognosis , Electrocardiography
8.
J Rheumatol ; 51(3): 297-304, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38101917

ABSTRACT

OBJECTIVE: The aim of this study was to investigate and compare different case definitions for chronic pain to provide estimates of possible misclassification when researchers are limited by available electronic health record and administrative claims data, allowing for greater precision in case definitions. METHODS: We compared the prevalence of different case definitions for chronic pain (N = 3042) in patients with autoimmune rheumatic diseases. We estimated the prevalence of chronic pain based on 15 unique combinations of pain scores, diagnostic codes, analgesic medications, and pain interventions. RESULTS: Chronic pain prevalence was lowest in unimodal pain phenotyping algorithms: 15% using analgesic medications, 18% using pain scores, 21% using pain diagnostic codes, and 22% using pain interventions. In comparison, the prevalence using a well-validated phenotyping algorithm was 37%. The prevalence of chronic pain also increased with the increasing number (bimodal to quadrimodal) of phenotyping algorithms that comprised the multimodal phenotyping algorithms. The highest estimated chronic pain prevalence (47%) was the multimodal phenotyping algorithm that combined pain scores, diagnostic codes, analgesic medications, and pain interventions. However, this quadrimodal phenotyping algorithm yielded a 10% overestimation of chronic pain compared to the well-validated algorithm. CONCLUSION: This is the first empirical study to our knowledge that shows that established common modes of phenotyping chronic pain can lead to substantially varying estimates of the number of patients with chronic pain. These findings can be a reference for biases in case definitions for chronic pain and could be used to estimate the extent of possible misclassifications or corrections in using datasets that cannot include specific data elements.
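To make the "15 unique combinations" concrete, the sketch below enumerates every non-empty subset of the four data modes and computes a prevalence for each. It assumes a patient counts as a case when any mode in the combination flags them (a logical OR); the abstract does not state the combination rule, but this assumption matches the reported rise in prevalence as modes are added. The field names and the five toy records are hypothetical.

```python
from itertools import combinations

# Hypothetical names for the four data modes mentioned in the abstract.
MODES = ("pain_score", "dx_code", "analgesic", "intervention")

# Toy records, invented for illustration only.
patients = [
    {"pain_score": True,  "dx_code": False, "analgesic": False, "intervention": False},
    {"pain_score": False, "dx_code": True,  "analgesic": True,  "intervention": False},
    {"pain_score": False, "dx_code": False, "analgesic": False, "intervention": True},
    {"pain_score": False, "dx_code": False, "analgesic": False, "intervention": False},
    {"pain_score": False, "dx_code": False, "analgesic": False, "intervention": False},
]

def prevalence(records, modes):
    """Share of records flagged by at least one of the given modes (assumed OR rule)."""
    flagged = sum(1 for r in records if any(r[m] for m in modes))
    return flagged / len(records)

# All non-empty subsets of four modes: 4 + 6 + 4 + 1 = 15 case definitions.
results = {
    combo: prevalence(patients, combo)
    for k in range(1, len(MODES) + 1)
    for combo in combinations(MODES, k)
}
```

Under an OR rule, adding a mode can never lower the estimate, so the quadrimodal definition gives the upper bound among the 15.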


Subjects
Autoimmune Diseases , Chronic Pain , Rheumatology , Humans , Chronic Pain/diagnosis , Chronic Pain/epidemiology , Electronic Health Records , Algorithms , Analgesics
9.
Pac Symp Biocomput ; 29: 1-7, 2024.
Article in English | MEDLINE | ID: mdl-38160265

ABSTRACT

Artificial Intelligence (AI) models are substantially enhancing the capability to analyze complex and multi-dimensional datasets. Generative AI and deep learning models have demonstrated significant advancements in extracting knowledge from unstructured text, imaging as well as structured and tabular data. This recent breakthrough in AI has inspired research in medicine, leading to the development of numerous tools for creating clinical decision support systems, monitoring tools, image interpretation, and triaging capabilities. Nevertheless, comprehensive research is imperative to evaluate the potential impact and implications of AI systems in healthcare. At the 2024 Pacific Symposium on Biocomputing (PSB) session entitled "Artificial Intelligence in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface", we spotlight research that develops and applies AI algorithms to solve real-world problems in healthcare.


Subjects
Artificial Intelligence , Clinical Medicine , Humans , Computational Biology , Algorithms
10.
medRxiv ; 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38076944

ABSTRACT

In a randomized, pre-post intervention study, we evaluated the influence of a large language model (LLM) generative AI system on accuracy of physician decision-making and bias in healthcare. 50 US-licensed physicians reviewed a video clinical vignette, featuring actors representing different demographics (a White male or a Black female) with chest pain. Participants were asked to answer clinical questions around triage, risk, and treatment based on these vignettes, then asked to reconsider after receiving advice generated by ChatGPT+ (GPT4). The primary outcome was the accuracy of clinical decisions based on pre-established evidence-based guidelines. Results showed that physicians are willing to change their initial clinical impressions given AI assistance, and that this led to a significant improvement in clinical decision-making accuracy in a chest pain evaluation scenario without introducing or exacerbating existing race or gender biases. A survey of physician participants indicates that the majority expect LLM tools to play a significant role in clinical decision making.

12.
Am J Infect Control ; 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-37972820

ABSTRACT

BACKGROUND: While airborne transmission of rhinovirus is recognized in indoor settings, its role in hospital transmission remains unclear. METHODS: We investigated an outbreak of rhinovirus in a pediatric intensive care unit (PICU) to assess air dispersal. We collected clinical, environmental, and air samples, and staff's surgical masks for viral load and phylogenetic analysis. Hand hygiene compliance and the number of air changes per hour in the PICU were measured. A case-control analysis was performed to identify nosocomial rhinovirus risk factors. RESULTS: Between March 31, 2023, and April 2, 2023, three patients acquired rhinovirus in a cubicle (air changes per hour: 14) of a 12-bed PICU. A portable air-cleaning unit was placed promptly. Air samples (72,000 L in 6 hours) from the cohort area, and outer surfaces of staff's masks (n = 8), were rhinovirus RNA-negative. Hand hygiene compliance showed no significant differences (31/34, 91.2% vs 33/37, 89.2%, P = 1) before and during the outbreak. Only 1 environmental sample (3.8%) was positive (1.86 × 10^3 copies/mL). Case-control and next-generation sequencing analysis implicated an infected staff member as the source. CONCLUSIONS: Our findings suggest that air dispersal of rhinovirus was not documented in the well-ventilated PICU during the outbreak. Further research is needed to better understand the dynamics of rhinovirus transmission in health care settings.

13.
Vaccines (Basel) ; 11(11)2023 Nov 11.
Article in English | MEDLINE | ID: mdl-38006044

ABSTRACT

Seasonal influenza is a leading cause of death in the U.S., causing significant morbidity, mortality, and economic burden. Despite the proven efficacy of vaccinations, rates remain notably low, especially among Medicaid enrollees. Leveraging Medicaid claims data, this study characterizes influenza vaccination rates among Medicaid enrollees and aims to elucidate factors influencing vaccine uptake, providing insights that might also be applicable to other vaccine-preventable diseases, including COVID-19. This study used Medicaid claims data from nine U.S. states (2016-2021), encompassing three types of claims: fee-for-service, major Medicaid managed care plan, and combined. We included Medicaid enrollees who had an in-person healthcare encounter during an influenza season in this period, excluding those under 6 months of age, over 65 years, or having telehealth-only encounters. Vaccination was the primary outcome, with secondary outcomes involving in-person healthcare encounters. Chi-square tests, multivariable logistic regression, and Fisher's exact test were utilized for statistical analysis. A total of 20,868,910 enrollees with at least one healthcare encounter in at least one influenza season were included in the study population between 2016 and 2021. Overall, 15% (N = 3,050,471) of enrollees received an influenza vaccine between 2016 and 2021. During peri-COVID periods, there was an increase in vaccination rates among enrollees compared to pre-COVID periods, from 14% to 16%. Children had the highest influenza vaccination rate among all age groups at 29%, whereas only 17% of those aged 5-17 years and 10% of those aged 18-64 years were vaccinated. We observed differences in the likelihood of receiving the influenza vaccine among enrollees based on their health conditions and medical encounters. In a study of Medicaid enrollees across nine states, 15% received an influenza vaccine from July 2016 to June 2021. 
Vaccination rates rose annually, peaking during peri-COVID seasons. The highest uptake was among children (6 months-4 years), and the lowest was in adults (18-64 years). Female gender, urban residency, and Medicaid-managed care affiliation positively influenced uptake. However, mental health and substance abuse disorders decreased the likelihood. This study, reliant on Medicaid claims data, underscores the need for outreach services.

14.
NPJ Digit Med ; 6(1): 213, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37990134

ABSTRACT

Patients experiencing mental health crises often seek help through messaging-based platforms, but may face long wait times due to limited message triage capacity. Here we build and deploy a machine-learning-enabled system to improve response times to crisis messages in a large, national telehealth provider network. We train a two-stage natural language processing (NLP) system with keyword filtering followed by logistic regression on 721 electronic medical record chat messages, of which 32% are potential crises (suicidal/homicidal ideation, domestic violence, or non-suicidal self-injury). Model performance is evaluated on a retrospective test set (4/1/21-4/1/22, N = 481) and a prospective test set (10/1/22-10/31/22, N = 102,471). In the retrospective test set, the model has an AUC of 0.82 (95% CI: 0.78-0.86), sensitivity of 0.99 (95% CI: 0.96-1.00), and PPV of 0.35 (95% CI: 0.309-0.4). In the prospective test set, the model has an AUC of 0.98 (95% CI: 0.966-0.984), sensitivity of 0.98 (95% CI: 0.96-0.99), and PPV of 0.66 (95% CI: 0.626-0.692). The daily median time from message receipt to crisis specialist triage ranges from 8 to 13 min, compared to 9 h before the deployment of the system. We demonstrate that an NLP-based machine learning model can reliably identify potential crisis chat messages in a telehealth setting. Our system integrates into existing clinical workflows, suggesting that with appropriate training, humans can successfully leverage ML systems to facilitate triage of crisis messages.
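The two-stage design (a keyword filter followed by logistic regression) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' system: the keyword list, the bag-of-words features, the six toy training messages, and the 0.5 threshold are all hypothetical, and the logistic regression is a plain gradient-descent fit written out so the example needs no external libraries.

```python
import math

# Hypothetical keyword list; the abstract does not give the study's terms.
CRISIS_KEYWORDS = {"die", "kill", "hurt", "suicide"}

def tokenize(text):
    return [w.strip(".,!?;:").lower() for w in text.split()]

def passes_keyword_filter(text):
    """Stage 1: keep only messages that mention at least one keyword."""
    return any(tok in CRISIS_KEYWORDS for tok in tokenize(text))

def featurize(text, vocab):
    """Binary bag-of-words vector over a fixed vocabulary."""
    toks = set(tokenize(text))
    return [1.0 if w in toks else 0.0 for w in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Stage 2 model: batch gradient descent on the mean log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def triage(text, vocab, w, b, threshold=0.5):
    """Flag a message only if it passes stage 1 and stage 2 scores it high."""
    if not passes_keyword_filter(text):
        return False
    x = featurize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold

# Toy training set (invented): every message already contains a keyword.
train_texts = [
    "i want to die tonight", "i plan to kill myself", "he said he will hurt me",
    "my knee hurt after running", "this traffic will kill my schedule",
    "my phone is about to die",
]
train_labels = [1, 1, 1, 0, 0, 0]
vocab = sorted({tok for t in train_texts for tok in tokenize(t)})
weights, bias = train_logistic_regression(
    [featurize(t, vocab) for t in train_texts], train_labels
)
```

The cheap filter removes most routine traffic before any model runs, and the classifier then separates figurative uses of a keyword ("my phone is about to die") from genuine crisis language, which is one plausible reading of why precision matters at the second stage.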

16.
Microorganisms ; 11(10)2023 Sep 28.
Article in English | MEDLINE | ID: mdl-37894094

ABSTRACT

Staphylococcus argenteus is a novel Staphylococcus species derived from Staphylococcus aureus. Information on the prevalence and genetic characteristics of invasive S. argenteus in Asia is limited. In this study, 275 invasive S. aureus complex strains were retrieved from blood culture specimens in Hong Kong and re-analyzed using MALDI-TOF mass spectrometry and an in-house multiplex real-time PCR for S. argenteus. The prevalence of invasive S. argenteus in Hong Kong was found to be 4.0% (11/275). These strains were primarily susceptible to commonly used antibiotics, except penicillin. Whole-genome sequencing revealed the circulation of three S. argenteus genotypes (ST-2250, ST-1223, and ST-2854) in Hong Kong, with ST-2250 and ST-1223 being the predominant genotypes. The local ST-2250 and ST-1223 strains showed close phylogenetic relationships with isolates from mainland China. Antimicrobial-resistant genes (fosB, tet-38, mepA, blaI, blaZ) could be found in nearly all local S. argenteus strains. The ST-1223 and ST-2250 genotypes carried multiple staphylococcal enterotoxin genes that could cause food poisoning and toxic shock syndrome. The CRISPR/Cas locus was observed only in the ST-2250 strains. This study provides the first report on the molecular epidemiology of invasive S. argenteus in Hong Kong, and further analysis is needed to understand its transmission reservoir.

18.
JAMA Surg ; 158(12): 1349-1351, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37851462

ABSTRACT

This cohort study uses a deidentified national administrative claims database to assess the association of eligibility expansion with abdominal aortic aneurysm screening and diagnosis.


Subjects
Abdominal Aortic Aneurysm , Mass Screening , Humans , Abdominal Aortic Aneurysm/diagnostic imaging , Ultrasonography
19.
J Biomed Inform ; 147: 104522, 2023 11.
Article in English | MEDLINE | ID: mdl-37827476

ABSTRACT

OBJECTIVE: Audit logs in electronic health record (EHR) systems capture interactions of providers with clinical data. We determine if machine learning (ML) models trained using audit logs in conjunction with clinical data ("observational supervision") outperform ML models trained using clinical data alone in clinical outcome prediction tasks, and whether they are more robust to temporal distribution shifts in the data. MATERIALS AND METHODS: Using clinical and audit log data from Stanford Healthcare, we trained and evaluated various ML models including logistic regression, support vector machine (SVM) classifiers, neural networks, random forests, and gradient boosted machines (GBMs) on clinical EHR data, with and without audit logs for two clinical outcome prediction tasks: major adverse kidney events within 120 days of ICU admission (MAKE-120) in acute kidney injury (AKI) patients and 30-day readmission in acute stroke patients. We further tested the best performing models using patient data acquired during different time-intervals to evaluate the impact of temporal distribution shifts on model performance. RESULTS: Performance generally improved for all models when trained with clinical EHR data and audit log data compared with those trained with only clinical EHR data, with GBMs tending to have the overall best performance. GBMs trained with clinical EHR data and audit logs outperformed GBMs trained without audit logs in both clinical outcome prediction tasks: AUROC 0.88 (95% CI: 0.85-0.91) vs. 0.79 (95% CI: 0.77-0.81), respectively, for MAKE-120 prediction in AKI patients, and AUROC 0.74 (95% CI: 0.71-0.77) vs. 0.63 (95% CI: 0.62-0.64), respectively, for 30-day readmission prediction in acute stroke patients. The performance of GBM models trained using audit log and clinical data degraded less in later time-intervals than models trained using only clinical data. 
CONCLUSION: Observational supervision with audit logs improved the performance of ML models trained to predict important clinical outcomes in patients with AKI and acute stroke, and improved robustness to temporal distribution shifts.


Subjects
Acute Kidney Injury , Stroke , Humans , Electronic Health Records , Hospitalization , Prognosis
20.
J Am Med Inform Assoc ; 31(1): 188-197, 2023 Dec 22.
Article in English | MEDLINE | ID: mdl-37769323

ABSTRACT

OBJECTIVE: While there are currently approaches to handle unstructured clinical data, such as manual abstraction and structured proxy variables, these methods may be time-consuming, not scalable, and imprecise. This article aims to determine whether selective prediction, which gives a model the option to abstain from generating a prediction, can improve the accuracy and efficiency of unstructured clinical data abstraction. MATERIALS AND METHODS: We trained selective classifiers (logistic regression, random forest, support vector machine) to extract 5 variables from clinical notes: depression (n = 1563), glioblastoma (GBM, n = 659), rectal adenocarcinoma (DRA, n = 601), and abdominoperineal resection (APR, n = 601) and low anterior resection (LAR, n = 601) of adenocarcinoma. We varied the cost of false positives (FP), false negatives (FN), and abstained notes and measured total misclassification cost. RESULTS: The depression selective classifiers abstained on anywhere from 0% to 97% of notes, and the change in total misclassification cost ranged from -58% to 9%. Selective classifiers abstained on 5%-43% of notes across the GBM and colorectal cancer models. The GBM selective classifier abstained on 43% of notes, which led to improvements in sensitivity (0.94 to 0.96), specificity (0.79 to 0.96), PPV (0.89 to 0.98), and NPV (0.88 to 0.91) when compared to a non-selective classifier and when compared to structured proxy variables. DISCUSSION: We showed that selective classifiers outperformed both non-selective classifiers and structured proxy variables for extracting data from unstructured clinical notes. CONCLUSION: Selective prediction should be considered when abstaining is preferable to making an incorrect prediction.
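Selective prediction and the total misclassification cost it is judged by can be sketched as follows. The sketch assumes abstention is triggered by a probability band around 0.5 and that abstained notes carry a fixed cost (for example, the cost of manual review); the thresholds, cost values, and toy probabilities are hypothetical and are not the study's settings.

```python
def selective_predict(prob, lower, upper):
    """Return 1 or 0 when confident, or None to abstain."""
    if prob >= upper:
        return 1
    if prob <= lower:
        return 0
    return None

def total_cost(probs, labels, lower, upper,
               cost_fp=1.0, cost_fn=1.0, cost_abstain=0.25):
    """Sum the costs of false positives, false negatives, and abstentions.

    Returns (total cost, fraction of notes abstained on).
    """
    cost, abstained = 0.0, 0
    for p, y in zip(probs, labels):
        pred = selective_predict(p, lower, upper)
        if pred is None:
            cost += cost_abstain
            abstained += 1
        elif pred == 1 and y == 0:
            cost += cost_fp
        elif pred == 0 and y == 1:
            cost += cost_fn
    return cost, abstained / len(labels)

# Toy predicted probabilities and true labels (invented).
probs = [0.95, 0.10, 0.55, 0.45, 0.85, 0.30, 0.70, 0.05]
labels = [1, 0, 0, 1, 1, 0, 1, 0]

non_selective = total_cost(probs, labels, lower=0.5, upper=0.5)  # never abstains
selective = total_cost(probs, labels, lower=0.4, upper=0.6)      # abstains near 0.5
```

In this toy data the two errors of the non-selective classifier both sit near 0.5, so abstaining on them lowers the total cost whenever an abstention is cheaper than an error; if abstention were priced above the error costs, the comparison would reverse.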


Subjects
Adenocarcinoma , Support Vector Machine , Humans , Logistic Models